Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always force tasks off the queue #553

Merged
merged 2 commits into from
Jul 4, 2018
Merged

Conversation

smurfix
Copy link
Contributor

@smurfix smurfix commented Jul 4, 2018

When queue.get() is cancelled, the task must not be left on the queue.
Otherwise an interesting crash will result when the next item is pushed.

Best demonstrated by:

    with trio.move_on_after(0.01) as timeout_scope:
        await q.get()
    assert timeout_scope.cancelled_caught
    await q.put("Test for PR #553")

When queue.get() is cancelled, the task must not be left on the queue.
Otherwise an interesting crash will result when the next item is pushed.
@codecov
Copy link

codecov bot commented Jul 4, 2018

Codecov Report

Merging #553 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #553      +/-   ##
==========================================
+ Coverage   99.28%   99.28%   +<.01%     
==========================================
  Files          89       89              
  Lines       10491    10498       +7     
  Branches      728      728              
==========================================
+ Hits        10416    10423       +7     
  Misses         58       58              
  Partials       17       17
Impacted Files Coverage Δ
trio/_sync.py 100% <100%> (ø) ⬆️
trio/tests/test_sync.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c17a38c...82be61e. Read the comment docs.

@smurfix smurfix merged commit a947f43 into python-trio:master Jul 4, 2018
@smurfix smurfix deleted the queue branch July 4, 2018 05:39
@sorcio
Copy link
Contributor

sorcio commented Jul 4, 2018

@smurfix Ouch, sorry this made you waste time :( Thanks for adding a test and providing a fix!

I'm late for review but I want to add a note: I think that cleanup line pertains in the abort function, rather than in a finally clause. The abort function is only called on cancellation, while the finally clause is in the success path. And hopefully wait_task_rescheduled() only raises on cancellation.

@smurfix
Copy link
Contributor Author

smurfix commented Jul 4, 2018

Umm, no, the finally clause is in the "in any case" path. The await will get an exception (most likely trio.Cancelled).
The proof is in the pudding, i.e. if you run the second patch without the first, the test will fail.
In fact I probably should use except BaseException instead of finally because int the success case the task is guaranteed to have been removed from the dict. That should be somewhat faster, given that .pop() is likely to raise+catch an internal KeyError.

@sorcio
Copy link
Contributor

sorcio commented Jul 4, 2018

@smurfix I meant that the finally sits in the way of the success path, because it's executed every time instead of just when it's needed. As you correctly say, this can make make the success case (slightly) slower.

What I suggest is:

def abort_fn(_):
    self._get_wait.pop(task, None)
    return _core.Abort.SUCCEEDED

value = await _core.wait_task_rescheduled(abort_fn)

@smurfix
Copy link
Contributor Author

smurfix commented Jul 4, 2018

OK, thanks for clarifying. I didn't do that on purpose because there may be other exceptions (KeyboardInterrupt, anybody?) which would trigger the same problem.

@sorcio
Copy link
Contributor

sorcio commented Jul 4, 2018

My point was stronger than that. No other exceptions may occur in a sane state :) KeyboardInterrupt cannot happen in that code, wait_task_rescheduled() on its own is not supposed to throw, and only CancelError can be sent by the task scheduler. This is not a fragile coincidence that can change without notice: it's like this by design. abort_func is the elected place for cleanup, and it was my mistake to forget about it (to be fair, an earlier version of this code used to do exactly the same, then I removed it in 7cb8e14).

@njsmith
Copy link
Member

njsmith commented Jul 4, 2018

@smurfix I see that @pquentin already reminded you about the don't-merge-your-own-PR guideline in chat. No harm done, but just want to mention it here too in case others watching get confused. As you can see, even a small and fundamentally correct fix like this still can have some fiddly bits to get right...

@sorcio is correct that abort_fn is the canonical place to do this kind of cleanup. There's even an issue open (#315) suggesting that it might be simpler to unconditionally call abort_fn when rescheduling, to reduce code duplication between the normal and cancellation wakeup paths...

Note also that get is decorated with enable_ki_protection, so the only way KeyboardInterrupt can be raised there is through the cancellation mechanism (i.e., KeyboardInterrupt can happen, but it's guaranteed to call abort_fn).

Also, in the test, that 0.01 second timeout is making me itch. I think it does deterministically work right now, but the only reason there's no race condition is because of some subtle details about how the body of get interacts with the scheduler... I think it would be clearer and more reliable to write it like:

async def test_553(autojump_clock):
    q = trio.Queue()
    with trio.move_on_after(10) as timeout_scope:
        await q.get()
    assert timeout_scope.cancelled_caught
    await q.put("Test for PR #553")

@njsmith
Copy link
Member

njsmith commented Jul 4, 2018

given that .pop() is likely to raise+catch an internal KeyError

Apparently not:

# Popping a key that's present
In [3]: %timeit d = {1: None}; d.pop(1, None)
187 ns ± 8.37 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

# Popping a key that's missing
In [4]: %timeit d = {1: None}; d.pop(2, None)
176 ns ± 3.32 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants